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ABSTRACT 

Reconstructing the matter density field from galaxy counts is a problem frequently addressed 
in current literature. Two main sources of error are shot noise from galaxy counts and in- 
sufficient knowledge of the correct galaxy position caused by peculiar velocities and red- 
shift measurement uncertainty. Here we address the reconstruction problem of a Poissonian 
sampled log-normal density field with velocity distortions in a Bayesian way via a maximum 
a posteriory method. We test our algorithm on a ID toy case and find significant improve- 
ment compared to simple data inversion. In particular, we address the following problems: 
photometric redshifts, mapping of extended sources in coded mask systems, real space recon- 
struction from redshift space galaxy distribution and combined analysis of data with different 
point spread functions. 

Key words: large-scale structure of Universe - dark matter - methods: data analysis - tech- 
niques: photometric 



1 INTRODUCTION 

Correct and thorough signal analysis is vital in cosmology. This 
is for one thing due to often low signal-to-noise ratios of many 
cosmic measurements. An important fact to notice here is that many 
measurements can not be independently repeated, as nature only 
grants us with one realisation of the data such as the CMB or the 
large-scale structure of the universe (LSS). 

Since the complications in extracting the desired information 
from data are so fundamental, it is often not possible to draw sens- 
ible conclusions from it without some knowledge on the properties 
of the underlying signals. Although it would be desirable to be com- 
pletely independent and prejudice-free, one typically has to give up 
on some freedom in the analysis by restricting to a specific model. 
In return one gains much better constraints on the measured quant- 
ity. Hence a thorough signal analysis must take care of all those 
aspects, otherwise the wrong conclusions could be drawn. This is 
most naturally done in the Bayesian framework where all variables 
are considered to be subject to error and variance. 

The emphasis in this work is on the reconstruction problem 
of a log-normal density field that is sampled via a Poissonian pro- 
cess as simple description of galaxy formation and a number of 
other processes, like a highly structured gamma ray emissivity. We 
choose the log-normal field, because we think it suited to model the 
dark matter density distribution of the Universe (see section [3". l.l[ l. 
In addition, we extend the problem such that the signal is spatially 
distorted and galaxy counts from one position may show up at other 
locations. This way a sharp peak in the signal can show up as a 
broad distribution in the data depending on the underlying distor- 
tion process. This allows us to address the real space reconstruc- 



tion problem of the dark matter density field from redshift distorted 
galaxy counts and to naturally incorporate photometric redshift er- 
rors in our analysis, to name a few. 

The most generic case of uncertain position is the measure- 
ment error from the measurement apparatus itself which comes 
with any measurement. In many cases, these errors form a Gauss- 
ian distribution around a mean value. But there are also other 
cases like photometric redshift where due to the measurement tech- 
nique there is considerable chance for 'catastrophic outliers' which 
leads to non-Gaussian probability distributions. In our analysis the 
chance for such catastrophic errors can be naturally included and 
dealt with. This permits do deal with cases where such outliers 
are the rule, for example coded mask detectors in X- and 7-ray 
astronomy where a point source has to be identified via its com- 
plex mask shadow on the detector plane. Other areas where spatial 
distortion should be included in the analysis are 7-ray astronomy 
via Cerenkov telescopes, or Ultra-high-energy cosmic ray detect- 
ors due to the extended point spread functions of the measurement 
devices. 

In all examples so far the distortion of the data was known a 
priori and was fixed. However, one can go one step further and al- 
low the distortion to depend on the signal itself that one wants to 
measure. The paradigm for this problem is the measurement of red- 
shift in galaxy surveys. Here the aim is to measure the real space 
density distribution of matter in the Universe via galaxy counts, but 
since the presence of matter has an effect on the peculiar velocities 
of the observed galaxies, the distortion depends on the details of the 
signal to be reconstructed. Beyond the linear regime, where a dark 
matter halo has collapsed to a virialised object like a galaxy cluster, 
the galaxies have large peculiar velocities. Since only the compon- 
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ent along the line of sight adds to the redshift, those collapsed ob- 
jects appear as dense elongated structures in redshift space pointing 
towards the observer, therefore this is also called the "finger-of-god' 
effect. 



Including the feedback of matter on the redshift space distor- 
tions is the ultimate goal of LSS reconstruction. The point of our 
work is to address one very important effect so far often ignored in 
statistical inference of the LSS: spatial distortions. In order to focus 
our discussion on this, we work with simplified descriptions of the 
complex galaxy formation process. The adopted description, how- 
ever, was previously shown to provide good reconstructions despite 
its simplicity. 

Significant progress in the field of large-scale structure recon- 
struction with a log-normal model for the dark matter over-density 
has been achieved in a number of recent works. [Enfilin et af]j2009^ 
derive the MAP estimator and loop corrections thereof within the 
information field theory (IFT) framework. Kitaura et al.H2010) suc- 
cessfully apply the MAP Poissonian log-normal filter on mock data 
from N-body simulations. [Jasche et al.|(2009^ even achieved a re- 
construction from real world SDSS 7 data going beyond the MAP 
approximation by using a Hamiltonian sampling method. However, 
all these approaches do not take the spatial uncertainty of redshift 
measurements - i.e. the point spread function (PSF) - into account. 
The complications from a non-trivial PSF has been addressed in a 
number of works with similar data models as ours. |Hebert & Leahy| 
l fT992) ; [Greeri|(T990l >; |Wang et al.| »2008) consider a similar data 
model but use a signal clique prior adequate for image reconstruc- 
tion. [Oh|^^neden]|2009} have a distorted Poissonian data model 
but use a smoothing prior based on Fisher information for the sig- 
nal. |Nunez & Llacer| ( |1990 t also work with a Poissonian data model 
but use an ad-hoc image entropy as prior for their signal. [Rieden &| 
|Wells| ( |1978) ; |Gull & Dan iell ( 1978 1 work with maximum entropy 
prior for the signal distorted by a point spread function and approxi- 
mate the Poissonian distribution by a Gaussian. Different choices 
for such entropy priors are discussed in [Nityananda & Narayan| 
( [T982) ; [Comwell & Evansl ( [T985^ 

The outline of this work is as follows. We formulate the 
Bayesian reconstruction problem in a field-theoretic language in 
section|2] In section[3]we address the reconstruction problem from 
spatially distorted data. A suitable data-model for this purpose is 
introduced. In particular this includes a discussion of the distor- 
tion operator in section [J. 1.2.3 1 We address how approximate error 
bars can be constructed for our reconstructions. In section [331 we 
apply our method to the LSS reconstruction from photometric red- 
shift measurement in a ID test-case. We show how data sets with 
different error characteristics can be naturally combined within our 
framework in the subsequent section. In section [375] we apply the 
same algorithm to a completely different field, namely X- and 7- 
ray astronomy via coded mask telescopes. Finally, in section [376[ we 
formulate a distortion problem where the distortion itself depends 
on the signal to be reconstructed. We therefore propose a model 
how to approach the forward problem to transform from real into 
redshift space. We compare our results for this distortion model to 
a Metropolis-Hastings sampling method in section [3.6.4| In [4] we 
summarise our findings. Details about the notation can be found in 
[A] 



2 THEORETIC FRAMEWORK 

2.1 The way from data to information 

There is always a difference between the data that we measure and 
the information we want to extract from the data. The process from 
mere data acquisition to information extraction needs a model for 
the way how data is produced from a signal s. In a practical applic- 
ation one measures data d and tries to infer the signal that produced 
the data. In most cases this data inversion is not unambiguous when 
e.g. the signal s is continuous and the data d are discrete. Then, one 
needs to formulate the data model as probability distribution func- 
tions (PDF) such as P{d\s), P(s), P{d), P{s\d) which we call 
- following standard Bayesian naming conventions - likelihood, 
prior, evidence and posterior, respectively. 

This model may include physical and technical processes all 
the like, such as noise from the measurement system or natural and 
unavoidable sources of error. Noise in particular makes the map- 
ping from signal to data ambiguous and non-deterministic such that 
different signals can produce the same data. 

Although it was shown that it is in principle possible to extract 
prior information on the signal s from the data (Enfilin & Frommert[ 
[2010[ l, we will always assume that P{s) is given in advance. In 
many situations, the prior is taken to be flat with the intention to 
be 'prejudice-free'. However, the flatness of P{s) depends on the 
coordinate system chosen for s and is therefore a (hidden) choice 
for a specific coordinate system. Hence working with an explicit 
prior should not be seen as a flawed bias for a specific model but as 
a complete discussion of the problem. 

2.2 Optimal map malting 

Unfortunately, in most situations the process of data generation 
from the true signal st is not reversible which means that some in- 
formation is irreversibly lost and St can not be fully restored from 
the data. So the task of signal inference from a data set d is to pro- 
duce a map m as an approximation of st provided the likelihood 
P{d\s) how data emerges from the signal and the prior P{s). A 
reasonable map making strategy is to minimize the average error 
that one makes in the reconstruction of st. Here it is of key import- 
ance how the errors are weighted. A suitable measure for the error 
is the L2-norm defined as 

||s||2 = {^j dx\s,,f^ ' (1) 

where the appearing integral is a volume integral over the whole 
position space. 

Minimizing the expected error (||s — "^lla)^^!^) provides the 
prescription 

= {sa;)(^l^) = Jvs Sx P{s\d), (2) 

for map making, where the subscript x refers to the point or pixel 
whose estimated map value is to be calculated. Since s is a field, the 
Os-integral is a path integral which runs over all possible field con- 
figurations. Usually one has to take thorough care about the conver- 
gence of path integrals, but in all practical applications one deals 
with a finite number of signal points or pixels, such that the path 
integral reduces to a volume integral over a rather high but finite 
dimensional space. 

Of course, different Lp-norms lead to different map making 
techniques. Thus it is a question of taste and belief which Lp-norm 
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one prefers and therefore which map making technique one con- 
siders best. However, since we know that the reconstruction cannot 
fully recover the exact signal, it makes sense to be generous to small 
deviations but penalise large ones strongly, which is exactly what 
the La-norm does. 

As a word of caution, it should be mentioned here that al- 
though |2| is independent of the coordinate system chosen for the 
data d, it does depend on the coordinates of s, however. Thus we 
need to specify in which signal coordinates we want to minimize 
the reconstruction error. 



2.3 IFT formulation of moment calculation 

Moment calculation can also be formulated in the language of stat- 
istical field theory and since we are dealing with information here, 
we call this framework information field theory (IFT). This was ad- 
dressed by e. g.|Enfilin et al.|(|2 0Q9'l; Bialek et &\\ ( [1996) ; |Lemm| 
( [T9991 [20UTI 1; |Lemm et al.| \2QQ \ 2005 Ibut for completeness, we 
briefly summarise their findings. Bayes' law allows to rewrite (|2j 



{s\d) 



P{s)P{d\s) 
' P{d) 



Vss. 



P{s, d) 
■ P{d) ■ 



(3) 



As the set of all possible signals s is exclusive and exhaustive, one 
can marginalise P{s, d) over s 



P{d) = / VsP{s, d). 



(4) 



So it turns out, that the only required probability distribution for 
moment calculation is P{s, d). 

The crucial step now is to realise that the above integrals can 
also be formulated in the language of statistical field theory as 
e.g. |Lemm| ( [T999l >, [EnBim et al. ( 2009 ) , Jaynes l (TWh and others 
proposed. Following amongst others the idea of Metropolis et al. 

I953) ; |Hastings| (T970]l; |Geman & G"eman| ([l984jl; |Duane et al. 

1987 I, we define a probability Hamiltonian as 



H. 



log P(s, d) 



which leads to the IFT equivalent of the partition function 



Vse 



(5) 



(6) 



Comparing ^ with ^ one recognizes that P{d) = Zd- So the 
posterior can be expressed as P{s\d) — e~^''^^^ /Zd- 

The full advantage of this formulation becomes evident when 
the posterior is or is approximated by a Gaussian, i.e. 



P{s\d) = g{s-m, D) 



^-{s-mfn ^(s-m)/2 ^-y^ 



i27rD|i/2 

Then one can analytically calculate the generating functional 



Zd[J] 



Vse 



-Ha[s] + J^8 



(8) 



where J is an arbitrary field that we call source field in analogy to 
quantum field theory. From Zd[J] all moments can be calculated 
by functional derivation: 



{^xi ' ' ' ^Xji ) 



(s|d) 



(9) 



Zd SJxi ■ ■ ■ SJx 

This result will be useful in section 13.2.31 where we address 
error estimation. 



2.4 MAP approximation 

For many applications calculating the partition function is not feas- 
ible, because the joint probability is too complex that the gener- 
ating functional can be calculated analytically. In these cases one 
has to resort to an approximation of {s)^^!^) such as the max- 
imum a posteriori (MAP) map rrici, which approximates the pos- 
terior mean by the map that maximises P{s\d). The evidence 
only serves as normalisation constant, so maximising the posterior 
comes down to maximising the joint probability of signal and data. 
And since P ( s , d) can be expressed in terms of Hd [s] , maximising 
P{s, d) = e"^'''"' is - due to the monotony of the exponential 
function - equivalent to minimizing the Hamiltonian. Minimizing 
the Hamiltonian is a well-known principle from classical mechan- 
ics so it makes sense to call the MAP map the classical map. 

The posterior average (s)(g|^) and rrici are exactly equal, if 
P{s — mct\d) is a symmetric and singly peaked function. In the 
more interesting cases however, this is usually not the case. Yet 
i^){s\d) ~ '^c' often holds when P(s|d) is sharply peaked. While 
the posterior mean minimizes the expected L2 -distance from re- 
construction to the true signal, one can show that the classical map 
minimizes the Lo -distance. 



3 RECONSTRUCTION OF SPATIALLY DISTORTED 
SIGNALS 

3.1 Signals with uncertain position measurement 

The reconstruction of a signal from data with distorted or imprecise 
position information is a generic problem class. 

By signal we mean a distributed physical quantity, i.e. a field. 
In principle, the signal must be considered to be continuous but 
for most practical applications it must be sampled, which yields a 
field vector s. Therefore, we assume from now on that s is indeed 
an ordinary vector from a high dimensional real vector space. We 
also call this space of all possible signals the signal phase space, 
its elements s a signal realisation and the value of s at location i 
the field strength Si. 

The signal can be probed by a measurement apparatus which 
ultimately produces data which are discrete by nature. From this 
alone it is clear that data and signal are a priori defined on different 
vector spaces. In the following, the data will always be the count 
rates of events. Examples for these events could be the detection 
of galaxies in some direction with redshift in a specific range or 
the registration of photons in a X-ray detector. The number of such 
events as a function of our data space coordinates then form a data 
vector. 

The response p of an experimental set-up is by definition the 
expectation value of the data d averaged over all possible data real- 
isations: 



M = id)(du 



(10) 



In other words, the response is perfect noiseless data. We say that 
the response is local, when the field strength Si triggers only events 
in one data bin dj . If on the other hand Si may trigger events in a 
number of data bins dj^ , dj^ ... we say that the response is non- 
local. If in addition the same data bin dj may be filled by different 
Si J , we call the data space distorted. We address both the prob- 
lems where the distortion is independent of the underlying signal 
but also the case where the distortion does depend on the signal. 
In this work we always assume that the signal has a log-normal 
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PDF with known covariance. We choose this signal, since we be- 
lieve that it approximately models the large-scale matter distribu- 
tion and other signals. 



3.1.1 The log-normal distribution for matter 

There are several good reasons why to believe that the log-normal 
PDF models the large-scale matter distribution well. For one thing 
it has already been used by |Hubble| ( |1934[ l as early as |1934[ to suc- 
cessfully model the galaxy count rates in 2D sky patches. For an- 
other [Coles|&Jones]([T99T]l found that if an initially Gaussian ran- 
dom field, as it is predicted by most inflationary models and more 
importantly as it is observed in the CMB, evolves over time and 
the peculiar velocities grow linearly, then initial Gaussian field is 
evolved over time into a log-normal field. 

There is also evidence from N-body simulations that the log- 
normal PDF is an adequate description of the large scale matter dis- 
tribution. |KayoeFaL]([200T]l found by direct comparison of the one- 
point and two-point correlation functions obtained from N-body 
simulations to the one-point and two-point log-normal PDF that 
the former can be very accurately modelled by the latter even in the 
strongly non-linear regime with overdensities up to 100. |Kitaura| 
[eTaTlpOTo) could even show with mock tests that their Poisson- 
ian log-normal model was able to reconstruct the matter distribu- 
tion accurately for overdensities up to 1000. Furthermore, |Neyrinck| 
|et al.| j2009| > show also with data from N-body simulations that 
a log-normal density field fits the density power spectrum much 
better in the slightly non-linear regime than Gaussian fields do. 
In both of the above works, the N-body simulations were seeded 
with small Gaussian density fluctuations. Recently, the log-normal 
model has also been successfully applied to matter density recon- 
struction from SDSS data jJasche et al.||2009l |Jasche & Kitaura] 
[2009). 

And last but not least: the log-normal PDF is mathematically 
simple and therefore comparatively easy to handle. 

Hence we assume that the matter density p in the universe can 
be modelled by 



Poe 



(11) 



Here po is a reference density, close to but different from the mean 
density, the exponential function is meant to be taken component- 
wise and the signal s shall be a field with Gaussian covariance mat- 
rix S, i.e. P{s) — G{s, S). Throughout this work we will always 
assume that s is a Gaussian field with covariance matrix S, if not 
stated otherwise. 

One characteristic of the log-normal distribution is that it gen- 
erates very large overdensities in the case of large S, while the inner 
structure of void regions is barely visible to the eye. Besides, the 
log-density field contains information about the primordial density 
fluctuations. Therefore, instead of reconstructing the density field 
itself, we reconstruct the log-density field s — log(p/po) which is 
Gaussian for the log-normal distribution. 



3.1.2 The data model 

3.1.2.1 The local response: As argued in section |2?T] the data 
model must provide a likelihood to obtain some data d given some 
signal s. Since only a minor portion of matter radiates, we have to 
rely on tracers of matter density. It is widely accepted that galaxy 
count rates can serve as tracers for the dark matter density field on 
large scales. 



As a first step we have to find a formulation for the local re- 
sponse to the signal. Therefore, we subdivide the observed space 
into boxes of volume Vi and relate /li, the expected number of 
events within Vi, to the field strength Si, the signal strength av- 
eraged over Vi . We assume the local response p of the observation 
to be given by 

b 



^J.^\s] 



a 1 



(12) 



Here w is the window function which encodes information on the 
exposure time and survey geometry that might have an impact on 
the detection probability in Vi and pi"' is some reference event 
density. For brevity we have defined Ki — Wi Vi pi"' which we call 
zero-response, because p[s = 0] = k. Throughout this work we 
always assume that k is known a priori. 

The bias b determines, how fast and how strong overdensities 
produce events. There are strong reasons to assume that a single 
linear bias factor is an oversimplification of nature as e.g. [Fry &| 
|Gaztanaga| ( [T993) ; patsutoa|j2008b"lal l; |Jeong & Komatsulpnogf 
show. Nevertheless, for a proof-of-concept at which we aim the 
single bias factor simplifies the set-up and preserves the relevant 
features of more elaborate models. Besides, a scale-dependent bias 
for s can be incorporated in the covariance matrix by working in 
the Fourier picture f Kitaura et al.|2010) . For 6^1 the response 
can be expanded as a power series in s 



p, [s] 



J=0 



+ bSi) 



which is the familiar form of a linear bias model used in galaxy 
cosmology. However, when b approaches unity for a signal with 
0(1) variance, higher orders of s become more and more important 
and the linear approximation is bound to fail. 

It must be stressed, that pi'^' is not the average event dens- 
ity, b ecause pe,i = {pi)(s) = «i f'Dse^'''g{s, S) = me'''^"^^ 
(e.g. |Coles & Jones|1991[ [Kayo et al.||2001^ . When setting up a 
simulation one usually specifies pe and notpj'' , so one has to keep 
this relation in mind when the bias is varied. 

3.1.2.2 The Poissonian sampling process: So far, we have only 
defined the response, which specifies how many events one can ex- 
pect on average in Vi provided the signal s. But since the number 
of events in Vi is always a (non-negative) natural number, one gets 
the true response only if the events are averaged over many differ- 
ent data realisations. However, in many cases nature prohibits this 
practice, because the number of events di is a random number with 
expectation value pi that is not redrawn between observations, such 
as the random number of galaxies in Vi. Therefore, the noise inflic- 
ted by the inaccurate approximation of the response by the events 
must be included in the data model. 

The Poissonian PDF describes the number of events when the 
number of possible outcomes of an experiment are vast, but only a 
few counts are expected. This applies to the case for the observed 
number of galaxies in a box V where we have only the information 
of the expected number of enclosed galaxies. Of course it would 
be desirable to include more knowledge about the environment of 
a galaxy into the PDF for their abundance as many semi-analytic 
models do (e. g. [White & Frenk' 1991 ; Kauffmann et al.|1993[[EFl 
|stathiou|2000[|CoIe eta l. 2000i. Many N-body simulations indicate 
that there are numerous parameters that could or should be taken 
into account (e.g. [Navarro et al.|1996{|Springel et al.|2005[|Jenkins| 
|et al.|200T[ l. However, adding more complexity makes the problem 
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even harder to tackle than the Poissonian PDF. Besides, due to the 
generic nature of the Poissonian PDF, the method can also be ap- 
plied to completely different problem class, such as the shot noise 
of X-ray photons in a detector 

We now make the assumption, that di, the number of events in 
Vi, is independent in its noise properties from all other data bins. In 
these assumptions we follow the works of Layzer ( 1956| l; [Peebles| 
( [T980t ; [Martinez & Saar| l |2002> ; |Kitaura et al.| ( |2009F ! 

Thus the likelihood of the data becomes 



p(diM) = n 



di - 

Mi e 



(13) 



As /i can be expressed in terms of s, this provides the desired likeli- 
hood P{d\s). The data d can also be viewed upon as the Poissonian 
sampled response. While /i itself is by definition noiseless, every 
realisation d has Poissonian noise. 

3.1.2.3 The distortion operator: So far, the response (12) is 
entirely local. We now propose the concept of a distortion operator 
which naturally introduces non-locality in the response ^12\ . 

Since it is the data space that is distorted, one should apply the 
distortion to the local response ([12}. Hence we define the distorted 
response 



Hi 



j 



(14) 



with a distortion matrix R. 

It is a reasonable requirement that the distortion matrix should 
preserve the expected number of events, i.e. 



(15) 



The most natural explanation of R is to interpret R^j as the prob- 
ability that an event which occurred at position Xj in signal space is 
observed at location Xi in data space and increases d,;. This is also 
how we set up R in practice and it is easy to show that a distortion 
set up in this way fulfils the 'conservation of /i' ^T^. Note that the 
factor K can be absorbed into R which we will assume from now 
on. Then, Tlix has the meaning of the rate of events in data bin i 
per e' in signal space. 

It is now in order to summarise in brief our data model. 

• The log-density signal is a Gaussian field with covariance mat- 



rix S, such that P{s) 



|27rS|l/2 



The matter distribu- 



tion is modelled by a log-normal PDF, such that the density in Vi is 

pi = poe^ 

• The response is given by fJ-i = ^ Rije' . 

• The data likelihood given the signal is P(d|s) — Yl^ ^' — . 

It is evident that the complexity of this system strongly depends 
on the structure of R, and in particular if R itself depends on the 
signal s to be reconstructed, or not. 

The local data model with R = k*1 has been solved in 
a fully Bayesian framework by |Enfilin et al.| ( [2009^ . Implement- 
ations thereof in the field of large-scale structure reconstruction 



Kitaura et al. 


(2009}, 


Kitaura et al. 


(2010^1 and Jasche et al. 



l |2009^ ; [jasche & Kitaura| ( (2009^ , respectively. There are of course 
also other survey based reconstructions of the large scale structure. 



Such can be found in Bertschinger et al. (1990); Yahil etal. (1991 
fShaya et al.|(fT995 1; Branchini et al. ( 1996); Webster et al. ( 1997 
Schmoldt et al. (1999); Nusser & Haehneh (1999); Hoffman & 



|Zaroubi|(|2000p;|Goldberg| ( |2001p ; |Erdogdu et al.|(|200 4); Vogeley 
|et al.|(|2004| ); |Huchra et ai^ ( [2005} ; |Fercival| ( |2005}nErdogdu et al. 
I2OO6 . 



3.2 Setting up a reconstruction problem 

In order to test the map making techniques on the log-normal dens- 
ity field with Poissonian noise, we generate mock data from a 
known signal and try to reconstruct the signal. Aiming at a proof 
of concept, it is in order to keep the complexity of the numerical 
simulation low, hence we restrain ourselves to a ID case. This re- 
striction is not fundamental, because the same algorithm can also 
be applied to higher dimensional problems. In a sense the topolo- 
gies of the spaces involved are encoded in S and R, but as we will 
see below the inner structure of S and R are a priori irrelevant for 
our method on an abstract levej^ The only downside of multidi- 
mensional reconstructions is that the dimension of the field vector 
space grows like A'^^ where A'^ is the number of pixels along the any 
direction. So even moderate resolutions have severe impact on the 
performance of any calculation and demand extensive optimisation. 
Therefore we contend ourselves with A'^ pixels in one dimension for 
this conceptual work. 

In order to be free of boundary effects we choose a ring-like 
topology for our reconstruction, such that the first and the last pixel 
are direct neighbours. This ring-like topology is also not necessary 
since boundary effects can be treated naturally, but it simplifies the 
set-up. 

The first step is to choose an adequate power spectrum for 
the signal. Since we are mostly aiming for the principles of re- 
constructions with spatially distorted data, the details of the power 
spectrum are not essential here as long as most of the power is 
concentrated on the large scales. Inspired by the Yukawa potential 
from quantum electro-dynamics ( (Peskin & Schroeder 1995 ) which 
mediates a force with a range of Ic or - in a different language 
- introduces correlation with a characteristic length scale Ic we try 
P{k) cc j^2^i-2 ■ However, in order to have a more structured look- 
ing signal, we introduce a boost term which selectively amplifies 
power on large scales: 



P{k) 



0.2 + e~"' 

fc2 + 



(16) 



All reconstructions in this work were done for a correlation length 
Ic = 0.05 L where L is the length of the simulation interval. 

As the power spectrum is the diagonal of the Fourier transform 
of the covariance matrix, this allows to easily compute 



dk e 



2m{xk — yk) 



P{k) = [P] {x - y) 



(17) 



Here J" ^ [/] (a;) must be read as the inverse Fourier transform of 
/ evaluated at x and P{k) stands for the matrix with P{k) on the 



^ The authors of this work use a mathematically similar response fi = 
k(1 -I- 6(6" — 1)) that could also be used to model spatial distortion. 
Unfortunately, the authors do not mention the interconnection between their 
general scale-dependent bias B and a point spread function. 



^ It may however have an impact on convergence speed, but these details 
are left for further investigation. 
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diagonal. Similarly, we can compute 

S^^; =:^"'[1/P] (x-y),and 



(18) 
(19) 



which we need for the evaluation of Hd and to generate mock sig- 
nals, respectively. In Appendix|B]we describe how a random signal 
with given covariance matrix S can be constructed from a vector of 
random numbers using T, the root of S. 



3.2.1 Naive data inversion 

As competitor for the fully Bayesian reconstruction, we also con- 
struct an unoptimised filter, namely a non-Bayesian reconstruc- 
tion, that we call m„a. The naive data inversion is not as straight- 
forward as it may seem at first sight. Although the performance 
of nina is extremely poo^it is still worthwhile to investigate the 
problems of direct data inversion in order to understand why this 
approach is doomed to fail. 

In principle, naive data inversion seems sensible, as the data 
d trace the response /i which can directly be inverted to yield s. 
However, d is not exactly /i because of the Poissonian sampling of 
the response. As a non-Bayesian analysis neglects the signal co- 
variance for m„a, no information on the true underlying /ii can be 
inferred from adjacent pixels. Hence there is no way to tell if the 
data point di is lower or higher than the expected /li, and the only 
choice is to make the approximation fii ^ di. Furthermore, when 
the distortion depends on the signal, there is no way to consistently 
incorporate bin mixing in the data inversion, such that one has to 
approximate the distorted response by the local response \12\ . This 
leads to 



(20) 



However, this is only possible for pixels with di > 0, and a generic 
guess such as fhnaid = 0) = for those pixels cannot be correct, 
as for high galaxy densities with Wi Vi pi^^ = Ki > 1 the estim- 
ated signal is negative even for di — 1. A way out is to allow for 
Bayesian reasoning in this case and ask for the most probable field 
strength Si given its variance and di = This leads to 



(21) 



where W(z) is the Lambert W-function (i.e. the root of z = w e^). 
As it stands m„a is however a bad estimator for the signal as the 
noise from the Poissonian sampling is very present and the true 
signal is known to be smooth. 

It is common practice to apply a smoothing procedure to the 
inverted data. However, the shape of the smoothing kernel intro- 
duces new free parameters and there is no generic setting for it in 
non-Bayesian reasoning. Allowing for the use of the covariance S 
gives again a back-door, as it can be used to smooth the inverted 
data. We choose to convolve with T, the root of S, as it is more 



^ Therefore the subscript 'na' for naive 

* The careful reader may argue, why to distinguish between the two cases 
di = and di ^ 0. The same argument as for = can also be 
applied to 7^ leading to the similar result m„a,i = dibSa — 
^ W (^Kib^Sii o'*'* However, in most situations this result is very 

close to \20\ , so we decided to stick to the more data-oriented non-Bayesian 
solution. 



localised and therefore the convolved map reflects the data better. 
Hence the final non-Bayesian map is given by 



= Tm„ 



(22) 



Even though m„a gives really poor guesses for the underlying sig- 
nal s, our tests have shown that smoothing with a top-hat filter per- 
forms even worse. This also justifies our choice of T as smoothing 
kernel. 

We also like to stress, that even this naive map construction 
had to rely on some Bayesian elements in that signal prior infor- 
mation was necessary for treating the di — case and setting up 
optimal smoothing. Besides, at a closer look the smoothing proced- 
ure is questionable, because it introduces a hidden prior for a signal 
with a certain correlation length or power spectrum. But instead of 
introducing a hidden prior, one should rather be upright, make clear 
where the power spectrum for the signal enters the reasoning, and 
include the prior for s right from the start as we propose in the next 
section. 



3.2.2 Bayesian map-making 

Since a full posterior mean as proposed in |2.2| is not straight- 
forward for this problem, we have to rely on approximations for 
(*)(s|d)- possibility would be to sample the posterior via a 
Monte-Carlo Markov chain (MCMC) method, but this is compu- 
tationally very expensive and difficult to understand analytically, 
and therefore not suited for a proof of concept at which we aim. 
Instead we use the approximation of (s)j^|^j by the classical map 
as proposed in section [2!4] 

For this it is sufficient to minimize the probability Hamilton- 
ian of the problem defined by For our problem defined by the 
likelihood \13\ and the signal prior jTJ it is given by 

Hd[s] = -logP(d, s) = -log(P(d|s)P(s)) 

= 1/2 s^S-\s - ^ d« log p^ + J2^^^ + log(n '^"0 



1 t 



S — y~^(di iogPi — Pi), 



(23) 



where we have shifted the Hamiltonian by a constant in the last 
step. 

As the classical map rrici aims at minimizing the Hamiltonian, 
we can use efficient multidimensional minimization techniques. In 
particular, we use the conjugate gradient methocj^in the Broyden- 
Fletcher-Goldfarb-Shanno variant as provided by the GNU Sci- 
entific Librar^l^to minimize Hii[s]. 

The conjugate gradient method needs the derivative of the 
function to be minimized, so we calculate 



p J 5s 



(24) 



In the case where R is independent of s the gradient of p is 
rather easily computed 



— = Rife 6e 

OSk 



(25) 



® For an introduction to the method of conjugate gradients see |Shewchuk| 
(1994) 

*~TEe GNU Scientific Library is available from 
|http://www.gnu.org/software/gsl| 
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3.2.3 Error estimation 

The signal reconstruction alone is only of little use, as it con- 
tains no information on the reliability of the reconstruction. An 
approximation of the correct error intervals for rric; can be ob- 
tained via approximation of the posterior with a Gaussian. There- 
fore, we Taylor-expand the probability Hamiltonian up to second 
order in s and obtain Hd[s] ~ -ffo + — mc!)*D~'^(s — rrici). 
The Gaussian approximation of the posterior is then given by 
P{s\d) « C7(s — rrici, D). With the generating functional from 
lis} it is then easy to calculate the expected deviation from rrici 



{SSj: 5.3y) 



{s\d) 



(26) 



where 5s = s — rrici. One should keep in mind that this procedure 
gives only an approximation of the real one-a confidence interval 
due to the Gaussian approximation of the posterior and also does 
not display the cross-correlation between errors at different loca- 
tions. Visual inspection of the error bars obtained for our example 
figures shows that they are neither too large nor too small and are 
therefore a good approximation of the correct error bars. 

In our case where the Hamiltonian is given by ([23} one finds 



E 



U? SSm dSn 



■'. 5s, 



(27) 



where we have not implied any summation for clarity. 

If R depends on the signal we have to make the simplifying 
assumption that it varies only slowly with s such that its contri- 
bution to the gradient of /i can be neglected. The reasons for this 
simplification is mainly that for our specific R[s] -model developed 
in |3.6.1| this derivative is computationally extremely expensive to 
calculate, let alone the second derivative of R. Therefore, one gets 

OSm O-Smf'Sj^ 

which can be used in j27[ l. 



3.3 Photometric redshifts 

We apply our reconstruction algorithm to different cases, one of 
which is the reconstruction of dark matter density from galaxy 
counts whose redshift is determined photometrically. Photometric 
redshift measurement is based on the apparent colours c of galaxies 
rather than their full spectra, and schemes to assign a redshift z de- 
pending on the measured c. However, as Benftez ( 2000) points out, 
there is no unambiguous colour to redshift mapping even if more 
and more band filters are used. He argues that instead of a strict 
mapping z — z{c), one has to work with the full posterior P{z\c) 
that a galaxy with colour c actually has redshift z. 

Wittman^(,2009 ) treats this problem by drawing a random zmc 
from P{z\c) for each galaxy and continues his calculation with this 
Zmc- However, he is aware that this procedure only works if the 
number of galaxies per colour bin is large. So we choose a dens- 
ity field reconstruction from photometric redshift data as a testing 
ground for our method. 



3.3.1 Distortion matrix for photometric redshifts 

The distortion matrix for photometric redshift distortions maps 
from redshift into colour space and must be set up according to 
Rcz = P{c\z). The features we want to test for are asymmet- 
ric shape of R and the robustness to catastrophic outliers. As this 



- z = 0.125 
-- z = 0.250 
-- z = 0.500 




128 256 384 512 640 768 896 1024 



Figure 1. The probability distribution of observed colour c for three differ- 
ent redshifts. The x-axis is the number of the colour bin, the galaxy may 
be put into. Here wee use 1024 colour and redshift bins and the maximal 
redshift bin coiTesponds to ^ = 1. Note that the shape of the distribution 
is assumed to be independent of z and is only shifted towards higher col- 
our values. This is a simplification with respect to real photometric redshift 
probabilities, where the shape of the PDF does depend on 2. As we use a 
cyclic topology for this test-case, 2 = 1 corresponds to 2 = 0, and c = to 
c = 1024, respectively. Therefore, the catastrophic outliers for z = 0.500 
(dotted red) reappear at the low colour values on the left. 



is intended as a test-case, we make some simplifying assumptions 
about P{c\z). First, we assume that the shape of P{c\z) remains 
fixed when going to higher redshift and second we neglect the ef- 
fect of the spectral type on the colour PDF. Catastrophic outliers 
are modelled by a small Gaussian PDF contribution which is off- 
set from the main peak by half of the simulated interval length and 
whose height was chosen to be one tenth of the main peak. It should 
be stressed however, that once R is set up in a realistic way, no 
change is needed for the algorithm. In Fig.[T|we show the P{c\z) 
we use and it should be explicitly said that this is not a real-world 
P{c\z) but one that we designed to just look similar to a real-world 
P{c\z) as inO 



Benftez 



(2000 1. 



3.3.2 Reconstruction of redshift space matter distribution from 
colour space data 

For our reconstruction test we set up our simulation on an interval 
of length L = 1 split into 1024 evenly sized pixels. As we are not 
interested in boundary effects here, we set the window function to 
unity for the whole interval. 

In Figures|2]and|3]we show reconstructions of the signals with 
unit variance (si;)^^) ~ 1 from photometrically distorted data. In 
Fig.[3]we also show the characteristic residuals for this single sig- 
nal St when reconstructed by this technique, defined as 



Res 



(( 



md — St) 



'(d\st) 



(28) 



where rrid is the map as it is reconstructed from the data d. In prac- 
tice, we do this average for 500 different data realisations of the 
same signal st. This shows in which regions the algorithm gen- 
erally performs badly for a specific signal and where it does well 
independently from the data realisation. One may notice in Fig.|3] 
that in actual data realisations the galaxy numbers vary for differ- 
ent biasses while keeping pgai constant. This has to be expected 
because the average galaxy density is only the same for different 
biasses, when averaged over all signal realisations. This may af- 
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Figure 3. The top part of the nine panels show reconstructions of the same signal (dotted red) from data (grey dots) with Poissonian noise and typical spatial 
distortions as they occur in photometric redshift measurement. The data is where the point-wise inversion from l |21) would place it. The blue line shows the 
classical signal reconstruction as proposed by our method with its Ict en'or contours (thin blue lines). The bottom part of the nine panels show the characteristic 
residuals as defined by (28) for 500 different data realisations for this signal. 
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Figure 2. Reconstruction for a bias b = 1.5 and pg^i = 1000 of a signal 
(red dotted curve) from data (grey dots) with Poissonian noise and typical 
spatial distortions as they occur in photometric redshift measurement. The 
data is where the point-wise inversion from {21\ would place it and the thin 
evenly dashed black line shows the naive map rrina - The smooth blue line 
shows the classical reconstruction with the correct distortion matrix, the 
dash-dotted green line shows a MAP map without the distortion of 
colour space taken into account. Icr error levels are indicated by thin lines. 



feet the comparability of the characteristic residuals for different 
biasses, but not for the same bias and different galaxy densities. 

The first thing to notice about the reconstructions is that in 
all cases they are pretty smooth even though the noise in the data 
is considerable and also the original signal has far more power on 
small scales. This is against common knowledge that MAP maps 
pick up a lot of noise. What prevents this in our case is the distortion 
matrix which act as a smoothing operation on the observed data 
such that rrici itself becomes smooth again. 

In Fig.|2]we also show a classical reconstruction m"; that does 
not take the distortion of data space into account, i.e. it assumes that 
R = Compared to our full reconstruction nid one can see that 
neglecting the colour space distortion results in severe difficulties 
to get even the shape right. Instead of voids m"; even sees peaks 
at 0.4 and 0.65. Although large peaks are detected, their height 
is slightly underestimated because some events are scattered away 
which m"; is not aware of. The full reconstruction on the other 
hand can even reshape the void region located at 0.4, despite the 
abundance of events in that region originating from the large peak 
at 0.9. Looking at the characteristic residuals in Fig.[3]reveals that 
this is true for most data realisations. 

In Fig.[3]we show reconstructions of the same signal for differ- 
ent simulation parameters and their average residuals. In the recon- 
structions, one can make out two obvious trends for our simplified 
model: 

• The higher the galaxy density, the better the overall recon- 
struction (note in particular the decreasing level of Res'"*'). 

• The higher the bias the better becomes the reconstruction of 
peaks, but the worse becomes the reconstruction of void regions. 

Both observations are not surprising, since higher galaxy density 
means that the response is sampled with less Poissonian noise, so 
one expects the reconstruction to become better. Higher bias on 
the other hand sharpens the contrast such that peaks are selectively 
sampled with high accuracy whereas the galaxy density in void re- 
gions is reduced. This trend is also reflected by the error bars which 
tighten up in overdense regions for biasses larger than one, but not 
so for b = 0.5. 



Table 1. As an indicator for the quality of different reconstructions this table 
lists (||m — stllDj^^j, the average L2-distance from reconstruction m to 
true signal st, where the s-average runs over 500 random signal configur- 
ations. 
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The example for b = 0.5 and pgai = 500 shows that data vari- 
ance can make a big difference. Although the number of galaxies 
is nearly twice as large as in the panel to its left, the peak at 0.55 
is hardly detected at all. Looking at the residuals reveals that this 
is simply an artefact of a lucky data realisation for the low density 
case and an unlucky one for the middle density case. 

Looking at the characteristic residuals one notices, that for b — 
0.5 the general trend of the signal (i.e. the largest Fourier modes) 
can be detected even for very low galaxy densities, while sharp 
peaks and ditches are even difficult to resolve for pgai = 1000. For 
6 = 1.5 and b = 2.5 the overdense regions can be resolved even 
for low density of events, while the voids are difficult to reshape for 
low galaxy densities. 

It should be mentioned however, that the poor resolution of 
voids is also a problem of this specific example, because every 
void region has an overdense region that scatters events into the 
void (0.9 into 0.4 and 0.15 into 0.65). Had we chosen an example 
where two void regions scatter into each other, the resolution of 
these voids would be much better. 

In Table[T]we list the average residuals of 500 reconstructions 
of different signals. The naive map m„a is in all cases the worst 
reconstruction. Especially for low galaxy densities it is painfully 
close to the average residual of a zero map, which is expected to 
be unity for a Gaussian signal with unit variance. This tells above 
all one thing: for a reconstruction in the low signal to noise regime 
one must include some knowledge about the signal. For a low bias 
m"j and rrici have roughly the same error level when the galaxy 
density is low. When pgai rises, both maps become better but what 
surprises is that rrici seems to saturate over pgai > 500 at an er- 
ror level of ~ 0.15. Considering the fact, that the distortion of the 
signal inevitably destroys some information, the question arises, if 
this is already the optimum. Therefore, we let the same signals be 
processed without noise in the response, i.e. with perfect data and 
found for b — 0.5 that without Poissonian noise rrtd has an aver- 
age residual of 0.09 and m„a of 0.27. So we see that for the signals 
with pgai = 500 and pgai ~ 1000 there is still is some margin to 
be gained by even higher galaxy densities, but not much. It is re- 
markable, that rrina with noiseless data comes not even close to the 
performance of rrici with medium galaxy density. 

There is one anomaly worth noticing, namely the increase of 
the residuals from 6 = 1.5 to 6 = 2.5 observed for all maps but 
m"; in particular, which comes from catastrophic outliers scattering 
into void regions. Since count rates for b = 2.5 are only large at big 
overdensities and remain small even for smaller peaks, m"; makes 
a small peak from scattered events in voids. The same holds also 
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Figure 4. Classical reconstruction of a signal (red dotted line) from two 
independent data sets (data points not shown) for a bias of fe = 1.5, 
Pgal ~ '^'^^ ^^'^ Pgal ~ "^^^ reconstruction from photometric data 
alone is shown as a dark-green evenly dashed curve, the reconstruction from 
spectroscopic data as a green dash-dotted curve. The combined analysis of 
both data sets is shown as an blue smooth curve. The la error level of the 
combined reconstruction is indicated by the thin blue solid line. For the 
other reconstructions it was omitted for clarity. 



for m„a, but this map does not react as quickly on few events as 
m"; does, which explains the modest deterioration of m„a and the 
striking difference for m"; . 



3.4 Signal inference from independent data sets 

In real world surveys like SDSS there might be both available, 
accurate spectroscopic galaxy redshift and abundant less certain 
photometric redshift data. Can the former help to better localise 
the latter? 

Our formalism allows to combine both data sets by defining 



d= d' 



,(1) rf(2) 



R(^' R 



(2) 



(29) 



and business is as usual. 

In principle this allows to combine any two data sets, but 
now consider the case where d'^' are data from photometric red- 
shift measurement and d'^' from spectrosco pic red shift measure- 
ment. As such, we model R'^^ as in section 3.3.1 Since spectro- 
scopic redshift measurement is far more accurate than photometric 
redshift measurement, we assume spectroscopic redshift measure- 
ments to be exact, i.e. R'^^ = Ik with k being the zero-response 
{T2J. 

In Fig. [4] we show a reconstruction for one data realisation 
d'^' and d'^' with a bias b = 1.5. The combined reconstruction 



(s+ph) 
''cl 



mostly follows the reconstruction from spectroscopic data 
especially in overdense regions such as from 0.8 to 1.0 and 
0.5 to 0.6. This is an expected behaviour, since in this case 6=1.5 
and therefore overdense regions are accurately sampled even with 
the low average galaxy density pg^i = 60. And since the spectro- 
scopic data have no spatial error, they dominate the reconstruction 
in this regime. However, in regions where spectroscopic galaxies 
are rare, the combined reconstruction sometimes deviates substan- 

( s] 

tially from m^j (most conspicuous from 0.2 to 0.4). The crucial 
point to note is that the combined reconstruction is not an average 
of the two reconstructions ' and m^^'^^ as can be seen at 0.4 to 
0.5 and 0.6 to 0.7 where the combined reconstruction lies below 
the others. Therefore, the optimal combination of the two datasets 
is non-trivial and non-linear. 
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Figure 5. Reconstruction of a signal (red dotted line) from data (grey dots) 
that mimic the behaviour of coded mask apertures with a bias b = 2.5 and 
an average photon count rate fbyf^at = 500. The m^i reconstruction (blue 
smooth curve) shows only the largest peak of the signal, and some of its 
substructure, lir-error levels are indicated by thin lines. Being completely 
useless, the nai've map rrina was omitted. 



The statistics for 500 different signal realisations (not shown) 
confirm that the combination is more than just a superposition of 
the two single-dataset reconstructions. 

3.5 X-ray astronomy via coded mask telescopes 

Thanks to the generic structure of our approach, we can also ap- 
ply it to a completely different problem. One such problem is the 
detection of extended sources in coded mask aperture systems. 

Coded aperture systems were originally proposed by |Dicke| 
( |1968^ for the purpose of detecting point-like X-ray sources. In this 
scheme an absorbing plate with a pattern of pinholes is placed in 
front of a detector and the shadow of this plate on the detector al- 
lows with the knowledge of the plate pattern to unfold the count 
distribution and infer the positions of the X-ray sources. However, 
this technique becomes much more difficult, when the light sources 
are extended and not just point sources. 

We now demonstrate that it is in principle possible to map 
out extended sources with our method, when count rate and bias 
are high enough. Adapting our method to this problem, the mixing 
matrix in this case must be the coded mask pattern that lets light 
pass for open pixels and shields it completely for dark pixels. Hence 
we only need the pattern of a coded mask for our distortion matrix. 
Today's coded masks have an optimised pattern, but for our purpose 
the originally proposed random pattern ( |Dicke|1968[ l of blocking 
and transparent pixels is sufficient. 

In Fig.[5]we show two reconstructions on an interval with 512 
pixels from data obtained via a random pattern coded mask. In this 
set-up it is only possible to detect the largest peak, therefore we use 
a rather large bias of 6 = 2.5. It is remarkable that a surprisingly 
low number of photons Uphot = 404 is sufficient to detect the 
peak at position 0.25 in the top panel and even some bits of its 
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substructure. In regions where the algorithm can not see the signal, 
the error bar widens up to the interval +1 to -1. This is a good 
consistency check, because this is expected for a signal with unit 
variance when no information on the signal is available. 

The lower panel in Fig.|5]shows a reconstruction from a signal 
with an extremely high peak. Although such high peaks and count 
rates are quite rare, it is interesting how much details of the peak 
can be reconstructed with extremely small error range. Note that 
the smaller peak at 0.15 in this example - although comparable in 
size to the peak in the top panel - is not detected at all. This is due 
to the overwhelmingly larger brightness of the largest peak at 0.75 
whose photons hit the same detector, but with an approximately 



148 higher rate. The Poissonian noise from the larger 



peak overlays with the photons from the smaller peak, rendering it 
impossible to detect. 

This is however not yet a fully realistic set-up and further ef- 
fort must be put into refining this technique, as sometimes false 
peaks (with narrow error bars) turn up in the reconstruction. It is 
possible that this problem has already been amended by the inven- 
tion of especially designed coded masks which we did not consider 
here. 



3.6 Reconstruction with s-dependent distortion 

We now address the problem where the distortion matrix depends 
on the signal to be reconstructed. The paradigm for this is the meas- 
urement of distance via redshift. Due to the peculiar velocities of 
galaxies with respect to the Hubble frame the comoving distance of 
galaxies can not exactly be calculated from their redshift. Note that 
there needs not be a coordinate transformation from redshift into 
real space, because objects with different positions may have the 
same redshift. Therefore, the mapping from real to redshift space is 
not injective, thus not invertible. 

In particular, the gravitational pull of matter overdensities af- 
fects the peculiar velocities of galaxies. So if one wants to unfold 
the large-scale matter distribution in the universe by measuring an- 
gular positions and redshifts of galaxies, one has to take the dis- 
torting effects of matter on the redshift space into account. What 
makes this problem so particularly demanding is that the distor- 
tion operator, which transforms the real-space matter distribution 
into the observed redshift space matter distribution, depends on the 
real-space matter distribution itself which is to be reconstructed. 

One has to be aware that the assumption that the matter field is 
sampled by a Poissonian process will fail at some point when going 
to ever smaller scales. Here we show up problems one has to face 
even in an idealistic set-up where the forward transformation from 
real to redshift space is perfectly known. 



3.6.1 A statistical model for redshift space distortions 

We now address the forward problem of constructing the redshift 
space galaxy distribution from a known real space matter distribu- 
tion and try to keep our model as simple as possible. In the fol- 
lowing, "redshift space" will always refer to our model redshift 
space which we construct with the same characteristic features as 
the redshift space found in nature. To that we develop a simple 
model which bases on a few generic assumptions. We first distin- 
guish between two regimes: the linear and the non-linear regime of 
large-scale structure formation. 

The linear regime is dominated by the peculiar velocities from 
the bulk motion of matter in the direction of large-scale overdens- 



ities. The matter flow in this regime is described quantitatively by 
linear perturbation theory as 



(30) 



SaHQ 

where /(!^) = ^ and D+ is the linear growth 

factor of matter perturbations during the matter dominated era of 
the universe. The potential $ is determined by 



(31) 



where p is the mean matter density and G is the gravitational con- 
stant ^Mukhanov|2005 ' chapter 6). The first to point out the con- 
nection between linear perturbation theory of matter perturbations 
and redshift space distortions was Kaiser ( 1987), for an extensive 
review on the topic we refer to ,Hamilton,^1998^ . The bottom line is 
that linear distortions lead to an amplification of overdensities and 
a depletion of underdensities in redshift space. 

However, this relation can only be valid on large scales, be- 
cause small scale inhomogeneities rather collapse and form a grav- 
itationally bound object for which linear perturbation theory fails. 
Experimentally this manifests in the deviation of the redshift space 
matter power spectrum from the expected power spectrum of l inear 
perturbation theory for wave numbers k>0.15h Mpc~^ (e.g. Per- 



[cival et al.|p007) ; [Srmrh et al.|p003} ). Therefore, when we calciT 
late V from ( |30[ l, we first apply a tophat lowpass filter on the poten- 
tial $ and continue with \30\ . This way, only the largest modes of 
the linear velocity field are resolved which we use in our set-up of 
the distortion matrix R. 

The non-linear regime sets in when the perturbations in the 
matter distribution collapse and form gravitationally bound objects 
of numerous galaxies. In redshift space this presents itself as elong- 
ated structures along the line of sight which are called 'fingers-of- 
god' . As the gravitational force of any galaxy on any other in the 
superstructure are relevant, this is effectively an N-body problem 
which is known to behave chaotically. So in contrast to the linear 
contribution to the redshift space distortions, the non-linear contri- 
bution to the peculiar velocity cannot be explicitly given. It is there- 
fore in order, to resort to a statistical approach here, and we assume 
that objects in the non-linear regime are completely virialised and 
that their velocities have a Boltzmann PDF. 

From the virial law of classical mechanics ( [Landau & Lifshitz] 
|1966[ l we can obtain a relation between the gravitational poten- 
tial and the time averaged velocity: = — {^)f Assuming 
(<I>)j ~ $ yields a relation between the potential and the velocity 
dispersion. We now assume that the velocity PDF for the objects in 
a virialised system is given by a Boltzmann factor 



P{v„i) dv oc expl — 



dv. 



(32) 



which guarantees the right velocity dispersion. That non-linear ve- 
locity distortions are given by a Maxwellian distribution is a com- 
mon assumption used by many authors (e.g. [Peacock & Dodds 
T994i [Heavens & Taylor|1995||Tadros eraL][l999(|Tadros & Ef- 



stathiou|I996 1. The other school of thought models the non-linear 
velocity distribution as an exponen tial pairwise PDF, i.e. P 



(2VV)'^xp(-2V^|^|/a) (e.g. 



Bromley et al. 



1997 



Hamilton 



[T995l|Ratcirffe et al.|1998t . 

We now have to blend the two regimes into one general ex- 
pression. Therefore, we extend and make the ansatz 



P(«ll) oc exp - 



2F($) 



(33) 
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where F is a continuous function to be determined and is the 
component along the line of sight of the linear velocity field de- 
termined by l |30[ l. The function F must have the following limiting 
behaviour in order to interpolate between the linear and non-linear 
regime: 



lim = l$l/2 



lim F(^) 



0. 



Since F stands in the denominator, it must never become exactly 0, 
hence it must also not change sign. Apart from that, the Boltzmann 
factor should play little to no role in the linear regime where $ is 
above a threshold $o over which we do not assume objects to be 
virialised. In principle, we should therefore require that _F ~ in 
these regions, i.e. exact measurement. In practice on the other hand, 
every measurement comes with uncertainties and even in the linear 
regime galaxies can have unpredictable small peculiar velocities. 
Subsuming this as 'measurement uncertainties', it makes sense to 
require that > (To which includes this instrument noise nat- 

urally in our formalism. In other words, we turn the shortcoming of 
our method to model exact measurement into the feature to always 
include a minimal error with variance (tq. For the non-linear regime 
we assume that galaxies are only virialised with their potential ex- 
cess A$ = $ - $0, i.e. F(A<E>) = -A<&/2. 

One possibility to smoothen the transition from linearly to 
non-linearly dominated regions is by a tilted hyperbola such as 



;A$ + 2ag) 



+ r2 



A$ 



1 2 



(34) 



which has the advantage of only introducing one more free para- 
meter r which controls the smoothness of the transition. 

This allows to write the distortion matrix for the transforma- 
tion from real to redshift space as 

\2 



P{zi\xj, p) oc exp 



2F($, -$o) 



(35) 



Altogether we now have five free parameters to control the beha- 
viour of our self-built velocity dispersion model, namely 

• the strength of the linear velocity distortions, which is given 
for a cosmological model 

• the width of the tophat lowpass filter for the linear velocity 
field, which can be determined by analysing the redshift s pa ce mat - 

[Perc ival et al^ ( [2007[ l; 



ter power spectrum as A: < 0.15 /i Mpc ^ 
[Smith et al.|p003l > 

• T adjusting the smoothness of the transition from linear to 
non-linear behaviour 

• $0 for the zero-level of the gravitational potential under which 
non-linear effects kick in 

• and ao for any further measurement error. 

As we are aiming at a proof of concept, we also take the gravita- 
tional constant, which adjusts the strength of the non-linear effects, 
as a free parameter to generate visible effects with our distortion 
modeQ In Fig.|6] we show the mixing matrix as it was used in the 
reconstructions of the following section. 

As before, we use the conjugate gradient method for minim- 
ization of the probability Hamiltonian. This method requires the 
derivative of the Hamiltonian and therefore also the derivative of 



^ For completeness we give the numerical values of our settings: 47rGa^ = 
0.25, = 1.28, T = 5 • 10-4, (TO = 1-75 • lO"*, *o = and the 

tophat filter lets the lowest 1% of fc-modes pass. 




Figure 6. Example for a possible distortion matrix. Note that the maximum 
of the distribution is not necessarily on the diagonal due to the linear redshift 
distortions. The log-spaced contours indicate levels of equal height. On top 
the signal (solid red line) that generates this distortion matrix, the potential 
(blue dash-dotted line) and the streaming velocity resulting from linear the- 
ory (green dotted line) - all scaled to comparable amplitudes. Note that the 
strength of the linear displacement is slightly larger than in our simulation 
to make its effect better visible to the eye. 



R[s] with respect to s. However, for our model a direct calcula- 
tion of the gradient of R is not feasible, as it would demand the 
evaluation of N^^^ difficult to compute entries of R, which would 
pose strong limitations on the performance, as the evaluation of R 
already is the bottleneck of our algorithm. Therefore we make the 
approximation, that the contribution of ^ is small compared to 



- 1 



the other terms in the gradient of H, i.e. 

SH , fd 

5sk ' ' " 

'd 



Ssk 



(36) 



Although this is an approximation, a simulated annealing process 
started from the resulting map never finds a better minimum than 
this. Therefore we can safely regard the minimum obtained with 
the approximated gradient as the true minimum of H. 

Similarly, we are forced to make the same approximation 
when we approximate the error bars as already mentioned in |3.2.3| 

3.6.2 Results from the reconstruction of the real space matter 
distribution from redshift space data 

We set up a run of 500 signal reconstructions on an interval of 
length L = 1 with 1024 equally sized pixels. Due to our limited 
interval length we had to restrict ourselves to signals whose max- 
imum peak was less than 3.5, as for peaks higher than this value 
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Pgal = 250 pgal = 500 Pgal = 1000 




0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 1.0 

Figure 7. The top part of the nine panels show the real space reconstructions of the same signal (red dotted line) from data given in redshift space (grey dots) 
with Poissonian noise where the transformation from real to redshift space is given by \35) . The data is where the point-wise inversion from \2l) would place 
it. The smooth blue curve shows the classical signal reconstruction as proposed by our method with its Icr error contours (thin blue lines). The bottom part of 
the nine panels show the characteristic residuals as defined by l |28) for 500 different data realisations of this signal. 
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the tail of the smeared-out peaks overlap with the tail on the other 

sidtH 

For each signal realisation s we generate one possible data 
realisation d and do three different types of reconstructions, namely 
the naive map nina from |3.2.T| a MAP reconstruction neglecting 
redshift space distortions m"; and the full MAP map including the 
model redshift space distortions rrici. Fig. [7] shows our md recon- 
struction for one signal but for different h and pgai settings, along 
with the characteristic residuals Res'"*' for the reconstruction of 
the signal in the lower panels. For our model we find the following 
general trends 

• the higher the galaxy density, the better the reconstruction 

• the higher the bias, the worse the reconstruction of voids, but 
peaks on the other hand are better reconstructed 

The reasons for these trends are the same as in section [3.3.21 The 
reconstructions from galaxies with redshift distortions is therefore 
similar in many respects to the one with photometric redshifts. 

Note that the centre peak in the data (at 0.55) shows a clear 
dislocation from its signal counterpart in direction of the massive 
at 0.9 - an effect due to the linear velocity distortions. The md 
reconstruction however, being aware of the redshift distortions in- 
troduced by the large massive, places the peak at the right spot in 
all reconstructions. Interestingly, for large biasses this peak is well 
resolved even for very low galaxy densities as the characteristic re- 
siduals show. 

Yet there are limitations for the reconstructions as can be seen 
in the indentations flanking the largest peak at positions 0.8 and 
1.0. Although being comparable in size to the peak at 0.55, they 
are not resolved in any reconstruction, and this even remains the 
case for extremely large pgai which we do not show here. Ap- 
parently, these structures are irreversibly lost in the shadow that 
the larger peak at 0.9 casts on close-by smaller structures. This is 
also not unexpected, as the distortion matrix R is set up in a way 
that large peaks become smeared out over large distances, whereas 
small peaks remain localised. Therefore, the smaller peak appears 
as an extension of the plateau. The most important characteristic 
of this kind of error is that it does not improve with higher pgai in 
contrast to areas where more information can be gained by reducing 
the noise (e.g. wide void regions). 

In Figure [8] we show the centre panel from Fig. |7] with the 
reconstruction for 6 = 1.5 and pgai = 500 again but also the naive 
map rrina and the MAP map neglecting redshift space distortions 
m"; . At first sight, one notices the poor guess that m„a gives, which 
we take as an argument that Bayesian analysis is inevitable to tackle 
this problem. 

The more interesting competitor for the MAP map including 
the correction of redshift space distortions md is m"; . In general, 
the shape of the reconstruction is very similar which is as it should 
be, as both algorithms base on the same principles and work on 
the same data set. Yet there are some distinct features that make 
the difference. Most prominent is of course the correct treatment 
of linear redshift space distortions of nid in contrast to m", which 
sees the void from 0.15 to 0.3 further depleted and the peak at 0.55 
displaced towards the massive on the right. Another more subtle 
difference is that m"; picks up more small scale features from the 



As an estimate of how many signals this excludes, one can calculate 
+ erf^))^ 0.5% which excludes only 




Figure 8. Reconstruction of the same signal (red dotted line) as in Fig. |7] 
for an average galaxy density Pgai = 500 and galaxy bias b = 1.5. The 
data (grey dots) is where the point-wise inversion from \2\\ would place it 
and the thin evenly dashed black line shows the naive map m„a ■ In addi- 
tion to our redshift corrected map rrid (blue smooth curve) we also show a 
MAP reconstruction m"j neglecting redshift distortions (green dash-dotted 
curve). Note that m"^ and rrina pick up basically the same features, but 
responds much quicker to an excess of galaxies. Icr error levels are 
indicated by thin lines. 



Table 2. As an indicator for the quality of different reconstructions this table 
lists (||m — stllDfsjji the average L2-distance from reconstruction m to 
true signal St , where the s-average runs over 500 random signal realisations. 
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0.48 


0.42 


0.38 




m-d 


0.41 
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a minuscule subset of the possible signals. 



data as can be seen in region 0.8 to 1.1. This is due to the smooth- 
ing effect of R* on the rud reconstruction as already discussed in 
section r3.3.2l 

In contrast to the photometric case from section [J.3.2[ the er- 
ror bars do not tighten up in overdense regions. However this has 
to be expected, since our model was set up in a way, that the pos- 
ition uncertainty in the neighbourhood of large peaks is largest. In 
particular, this makes the detection of substructure in large peaks 
nearly impossible. 

For a signal-independent view on the reconstruction quality 
we list in Table |2] the average 1/2 -distance from reconstruction to 
true signal for 500 different signal reconstructions. 

The reconstruction benefit from including redshift space dis- 
tortions seems not to be overwhelming but tends to be larger if bias 
and galaxy density rise. Notably for m"; the effects from linear and 
non-linear redshift distortions partially cancel each other for the 
parameters chosen here and thereby improve the performance of 
m"; . This is because non-linear redshift space distortions smear out 
large peaks while linear redshift distortions compress large over- 
densities. If we turn off linear redshift distortions and set up R only 
with our virialisation model for non-linear redshift distortions, we 
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Figure 9. Example for a possible distortion matrix where the potential lead- 
ing to the non-linear redshift distortions is high-pass filtered. Note that the 
maximum of the distribution is not necessarily on the diagonal due to the 
linear redshift distortions. The log-spaced contours indicate levels of equal 
height. On top the signal (solid red line) that generates this distortion mat- 
rix, the potential (blue dash-dotted line) and the streaming velocity resulting 
from linear theory (green dotted line) - all scaled to comparable amplitudes. 
Note that in contrast to Fig.[6]the sharp peak at 0.55 is now a separate col- 
lapsed object with its own velocity dispersion. 

get the interesting effect that the average I/2-distance of rrici to st 
becomes smaller, but for m"; it increases instead. 

So far we have assumed that the underlying redshift distortion 
model is perfectly known and its parameters are the same both for 
data generation and reconstruction phase. In a set-up for measured 
data this is not the case. Neither is there a perfect model for the 
forward transformation from real to redshift space, nor are its para- 
meters accurately measured. How severely parameter errors affect 
the reconstruction has not been scrutinised, but can be aim of fur- 
ther investigation. Still, if this model was adapted and applied to 
measured data, the above mentioned reconstruction characteristics 
and limitations would hold. 



3.6.3 Refining the distortion model 

In section U.6. ll we have introduced a model for the transformation 
from real to redshift space in a statistical way. One important step 
was to apply a lowpass filter to the potential before the linear velo- 
city field was calculated. However, similarly to using only the low 
modes of the perturbation for calculating the velocity distortions, it 
would be reasonable to use only the high modes for calculating the 
velocity dispersion. 

This is because the non-linear velocity dispersion predomin- 
antly depend on small-scale physics. If a halo with many galaxies 
collapses, the energy that the galaxies gain from the collapse is only 



the difference from their former potential energy and the potential 
energy after they have entered the collapsed structure. The velocity 
dispersion is hence not proportional to the full potential, but only 
to the fraction of the potential that the galaxies have actually fallen. 
Assuming that the galaxies in the collapsed structure are all from 
the vicinity of the collapsed object, this fraction can be estimated 
as the high frequency component of the potential. In the picture of 
energy conservation one may look upon this as 

• the low frequency part of the potential adds to the energy of 
linear velocity distortions and 

• the high frequency part of the potential adds to the non-linear 
velocity component. 

In Fig. |9] we show the resulting distortion matrix of the modified 
model for the same signal as in Fig.|6] 

When we set up the velocity distortions this way, we find that 
the problem becomes severely harder and the MAP solution ulti- 
mately fails to give reasonable results. Interestingly it is the low 
bias case 6 — 0.5 which is hardest. This is most likely due to the 
fact that the response for b = 0.5 makes a double peak from large 
peaks, i.e. bifurcates the peak. This misleads the MAP map to make 
two peaks out of one, which can lead to very weird behaviour. This 
bifurcation will eventually also happen for the larger biasses, if we 
turn up the strength of the velocity dispersion. 

Our tests show, that the MAP solution via conjugate gradient 
minimization has severe problems with this bifurcation. We also 
employed a simulated annealing technique for minimization which 
gave us the same results. So we can say with good confidence, that 
in the case of a bifurcated response the MAP method may give 
the most likely map, but still a very bad reconstruction. With the 
complexity of the distortion matrix at this point we finally have 
reached a limit for the MAP method. 



3.6.4 Comparison to Metropolis-Hastings sampling 

The question now arises, whether the MAP method is already the 
optimal reconstruction given the data or not. According to the- 
ory, this should not be the case, because the MAP map minimizes 
(||m — St ||o)(g|^), but the Lo-norm is a rather distorted measure of 
distance. Unfortunately, the posterior is too complex that (s)(j|j) 
can be calculated directly. However, if there was a way to approxi- 
mate the posterior by another PDF, that is easier to evaluate, we 
may be able to calculate (s)^^!^) directly. Since our interest is not 
in the detailed shape of the posterior, but our aim is simply to eval- 
uate integrals over the posterior quickly, we may approximate the 
posterior P{s\d) by 

1 ^ 

Pis) = j^J2^('-''^ (37) 

i = l 

where we construct the Si using a Metropolis-Hastings MCMC 
method. In principle the chain can start from any map sq, but for 
the sake of skipping the burn-in phase, we start our MCMC from 
rrici. At any given point s„ we generate a small variation Ss with 
the same power spectrum as the signal, but with far smaller ampli- 
tude and set s„+i = Sn + 5s. We accept or reject this new sample 
according to the Metropolis-Hastings criterion (Binder 1997 , e.g. ). 
This gives us a chain of maps where consecutive samples are cor- 
related, but after some steps the correlation vanishes. 

Since a MCMC is expensive with respect to computation time, 
we can not run several hundred reconstructions but have to be sat- 
isfied with the evaluation of selected cases. Therefore, we show in 
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Pga, = 500, 6 = 0.5 




0.0 0.2 0.4 0.6 0.8 1.0 



Figure 10. On the left two reconstructions for different settings of the biasses b = 0.5 and b = 1.5. The signal appears as a red dotted line, the data (grey 
dots) are where the naive data inversion from would place it. The naive map (thin evenly dashed line) is the smoothed data and shows signs of bifurcation 
at 0.9. The m^i reconstruction (blue dash-dotted curve) interprets the bifurcation as two different peaks, while the posterior mean (smooth green curve) does 
not. Icr error contours are indicated by thin lines. The pictures on the right show the uncertainty covariance matrix of the Metropolis-Hastings reconstruction. 
Note the long-range influence of the structure at 0.9. 



Fig. [To] only three reconstructions without the characteristic resid- 
uals as we did before. As one can see there, the sampling method 
and the MAP method give very similar results in the region from 
0.2 to 0.7 where the redshift space distortions are comparatively 
small, although the posterior mean tends to be a bit better espe- 
cially in the b = 1.5 case. Similarly, the width of the eiTor bars is 
virtually the same in this region. 

But not so in the region from 0.7 to 1.^ There, the MAP 
approach is taken in by the bifurcation and gives a very bad guess 
from 0.7 to 1.1 where it places peaks instead of valleys and a valley 
instead of a peak. And what makes the situation problematic indeed 
is that the estimated error bars are rather small there. The algorithm 
therefore completely misjudges its accuracy. As already mentioned 
above, the situation improves a bit for the bias 1.5 but the problem 
is still there. 

The sampling method on the other hand does the right thing in 
the interval 0.7 to 1.1. It nicely reshapes the largest peak and at the 
place of the side-peak 0.75 which is in the 'shadow' of the larger 
peak at 0.9 it widens the error bar to remain on the safe side. If this 
is just by chance of unlucky data cannot be judged for sure at this 
point. For the larger bias 1.5 the improvement is not quite so good 
because the shadow on nearby structures is much stronger. 

^ In order to not mix up the ranges [a, b] and [b, 1] U [0, a] we will say 
[6, 1 + a] if we mean the latter 



Table 3. Z/2-distance of reconstruction to signal for three reconstructions 
two of which are shown in Fig. |10| Note that there is hardly any improve- 
ment of the classical map from Pg^i = 500 to pgai = 1000 for b = 0.5 
while the posterior average improves by about 40%. For the larger bias 
b = 1.5 the difference between m^i and posterior average becomes less 
palpably. 

Pgal = 500 Pgal = 1000 Pgal = 500 

b = 0.5 6 = 0.5 6 = 1.5 

\\m^l-st\\^ 0.42 0.41 0.24 

ll(^>{.|d) -''tlli 0-17 0.10 0.16 

The impression that the reconstruction quality of the average 
map from Metropolis-Hastings sampling is much better than the 
rrici reconstruction also manifests itself in the L2 -distance which 
we list in Table[3] 

Note also how the uncertainty covariance matrix in Fig. [TO] 
changes from b = 0.5 to 6 = 1.5. To understand what lies be- 
neath one has to look at the Gaussian approximation of the posterior 
P{s\d) ^ A/'exp(j*s-s'D-^s/2).HereAA = e-^'°^/V|2vrD| 
is a normalisation factor and j summarises all terms of the prob- 
ability Hamiltonian of first order in s. Direct calculation yields 

(^)(.M)^AA|osse^-' = -^'°"^/^ = Di. (38) 
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[EnBlin et al.| ( |2009) call j the information source because it excites 
the posterior mean from a zero-map to a non-zero state. One can 
show that in our problem j contains a term linear in d and additional 
terms. Therefore, equation \3S\ tells how the posterior mean reacts 
on additional data. Hence the pictures of the propagator in Fig.|10| 
show how the posterior map is constructed from the information 
source j and introduces a correlation length in the map. 

In void regions the correlation length is longer than in over- 
dense regions because the Poissonian noise is larger there. If there 
were no other source of uncertainty but the Poissonian noise, the 
band would be tightest in the most overdense regions. But there 
is an additional source of uncertainty, namely the velocity disper- 
sion from the redshift distortion model. Therefore, in overdense re- 
gions the correlation length of the propagator does not collapse, but 
shows a pattern of strong correlation and anticorrelation, as can be 
seen in the region from 0.7 to 1.2 in both cases. Comparing the cor- 
relation length to the distortion matrix in Fig.|9]shows that this is in 
good agreement to the correlation induced from the distortion mat- 
rix. The main difference in the correlation matrix of the example 
with b — 1.5 is that the contrast is sharper and that the diagonal is 
much more structured. 



4 CONCLUSIONS 

Many reconstruction problems in cosmology suffer not only from 
large noise but also from substantial measurement uncertainties. 
While it is possible that some measurement uncertainties will be 
ameliorated in the future by more sophisticated techniques, other 
sources of uncertainty are fundamental such as cosmic variance 
and galaxy redshift distributions. Areas where this applies are real 
space LSS reconstructions from galaxy counts in redshift space, 
but also consistent treatment of photometric redshift. For precision 
cosmology with galaxies it is therefore of paramount importance to 
incorporate these uncertainties in the analysis. 

Here we have presented a novel method how spatially dis- 
torted log-normal fields as they occur in density field reconstruc- 
tion can be reconstructed in a Bayesian way. This method was de- 
veloped in the framework of information field theory which we out- 
lined in section |2] We showed that the IFT moment calculation ul- 
timately foots on the minimization of the expected 1/2 -weighted 
error of the reconstruction. Where exact moment calculation from 
the posterior was not possible, we argued how the correct map - 
the posterior mean - could be approximated by a MAP approach. 

We developed a data model for a log-normal signal with 
Poissonian noise where the response can be non-local. We even 
allowed for the case, in which the distortion of data space could 
depend on the signal that was to be reconstructed. The resulting 
problem is so complex that it could only be solved approximately 
via numerical minimization of its probability Hamiltonian. 

For a test of our approach we performed simulations where we 
constructed mock signals, produced mock data thereof and tried to 
reconstruct the underlying signal by numerical minimization of the 
probability Hamiltonian. 

We tested our reconstruction code on three different distortion 
problems which were 

• data with typical distortions as they appear in photometric red- 
shift measurement, 

• coded mask aperture problems as they appear in X- and 7-ray 
astronomy and 

• real space matter reconstruction from redshift distorted data. 



For the latter we developed a model for the forward problem to 
construct redshift space data from real space galaxy distributions 
and where the distortion was dependent on the underlying mat- 
ter distribution that was to be measured. We were able to tackle 
this problem with a MAP approach. However, after further com- 
plication of the distortion operator we found that the MAP method 
does not live up to its expectations. Instead, we could show that ap- 
proximating the posterior via Metropolis-Hastings sampling could 
give much more accurate reconstructions. Therefore we think, that 
for such complicated problems the MAP method gives misleading 
results and should be superseded by more powerful however also 
computationally more demanding approaches such as sampling the 
posterior PDF. 

For the coded mask data we were able to identify the largest 
peaks and showed that it is even possible to reconstruct their sub- 
structure if the count rates are high enough. An application of this 
approach to real X- or 7-ray data should be possible but before do- 
ing so, some effort must be spent to make the approach robuster to 
false detection of peaks. 

At last, the reconstruction of a redshift space signal from 
photometric redshift data proved to be very fruitful. In many cases 
we were able to reconstruct the underlying matter distribution re- 
markably well. Since the colour space distortion is independent of 
the underlying signal, an application of our approach to large data 
sets is feasible. 

We also showed that in the IFT framework it is possible to eas- 
ily combine data sets with different error characteristics. We con- 
sidered the problem of combining photometric redshift data with 
large uncertainties and spectroscopic data that are very accurate 
in position. Our analysis showed that even with a low abundance 
of accurate data it is possible to improve the reconstruction from 
distorted data with large abundance as long as there is room for 
improvement. 

In all cases we found that Bayesian analysis of the problem is 
inevitable for the noise level we were considering. We also showed 
that the reconstruction becomes significantly worse when the data 
were distorted, but the data space distortion was neglected during 
the reconstruction. Therefore we think that including the data space 
distortions in future precision analysis is inevitable. 

Since the assumptions of our method are based on a few gen- 
eric principles we are confident that further areas will be found 
where our work will be appreciated. 



APPENDIX A: NOTATION 

Having to deal extensively with field variables, one needs a short- 
hand notation for the calculus. Hence we define a vector space 
whose elements are fields - in this sense we use the terms 'vector' 
and 'field' interchangeably. The dot product in this vector space is 
defined by 

(Al) 

Instead of writing the field variable in round brackets we usually 
use a notation where the variable is in subscript to make the corres- 
pondence to finite dimensional vector spaces more obvious. 

We also use tensors of higher rank over this vector space, most 
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notably matrices. Analogously to \Ai) , we define 

{Mv){x)= I dyM(x,y)v(y) 



(MN)(a;,y) = / dzM{x, z)N{z, y) 



and so on. 

We also need a field space equivalent of the frequently used 

rules 



d 



82 



a 



da 



^ ] a; j kj — ki 



For all practical applications this is enough as one deals only with 
a finite number of pixels, however all equations remain valid if one 
defines Ihe functional derivative according tO |Peskin & Schroeder| 
( [T995I chap. 9) as 



5.L 



■Jy = S{x - y) 



_5_ 



I 



dyJy^y = 



This blends in naturally with our definition of the vector product 
(|aT). 

Vectors are printed in normal font with indices omitted. 
Matrices are in bold font where possible, with the notable excep- 
tion of derivatives of vectors such as ^ . To avoid an ambiguity in 
the sequence of the indices we define for derivations that the new 
index shall be last, i.e. 



Ssi 



All functions are understood to act component-wise, that is 
f{v) should be read as f{v)i = f{vi). In particular, this also ap- 
plies to fundamental operations, such as multiplication and divi- 
sion: {v ■ ■w)i = Vi Wi and (v/w) = Vi/wi. Note also that we do 
not adopt Einstein's summation convention, such that an expression 
as Vi e^^ should in fact be read as the number Vi multiplied by the 
number e™' . 

In this notation subtle differences do matter, but it shortens the 
formulas considerably. Here some intricate examples: 

{v{Sw)). = Vi SijW.^ = V, SijW^ 
j 

{v W^)ij — ViWj 
(^S)ij = ViSij 

(t;*S)i = ^VjSji = v^Sj, 



APPENDIX B: GENERATING MOCK SIGNALS WITH 
GAUSSIAN COVARIANCE MATRIX 

The aim is to generate signals with Gaussian covariance matrix 

(ss*)(.)=S. (Bl) 

By this definition, S is symmetric and positive definite such that it 
has a root, i.e. there exists a matrix T such that S = T*T. We now 
show that the mock signal can be generated by convolving a vector 
of Gaussian random numbers r with T: s = Tr 

Without loss of generality we restrict our reasoning to the case 



where the Gaussian random numbers in r have unit variance, such 
that the prior for r is 



l27rlli/2- 



(B2) 



Here the product runs over all pixel indices. From r we construct 
the signal vector s by application of T, which can be understood as 
a smoothing procedure for the random values. Note also that only 
in this step different points of the signal become correlated with 
each other - the entries of r are by virtue of the random nature of 
its entries uncorrelated. 

We now have to prove that this procedure gives the desired 
prior for s: 



P(s) = / Vr5{s - Tr) • P(r) 



Vq / Vr 



\2nl\ 



-iq^is-Tr) e 



V2 



|27rlli/2 



1 

|27rl| 
1 



Vqe 



s/2 



= e(s, S) 



|27rS|i/2 

As source for random numbers we use the 'Mersenne Twister' 
random number generator which offers a pseudo random number 
sequence with a period of 2^ 



1 (Matsumoto & Nishimura 



|1998[ l. Its advantages are speed and very good randomness. In par- 
ticular, we use its implementation from the GNU Scientific Library 

(gsO- 

Note that this procedure gives an easy and robust way to test 
whether the constructed signal has indeed the desired covariance 

where we have used that r is a vector of Npix Gaussian random 
numbers of unit variance in the last step. In other words, for a 
Gaussian random field s the number s*S^^s should roughly give 
the dimension of s. 
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